Skip to content

Conversation

miketheman
Copy link
Member

The API and claims generated from GitLab will be desensitized when
generating JWTs.

Instead of updating the regular expression and using the same generic
error message, create a custom validator with a custom message.

Resolves #18797

Signed-off-by: Mike Fiedler [email protected]

The API and claims generated from GitLab will be desensitized when
generating JWTs.

Instead of updating the regular expression and using the same generic
error message, create a custom validator with a custom message.

Resolves pypi#18797

Signed-off-by: Mike Fiedler <[email protected]>
Signed-off-by: Mike Fiedler <[email protected]>
@miketheman miketheman requested a review from a team as a code owner October 6, 2025 17:53
@miketheman miketheman added UX/UI design, user experience, user interface trusted-publishing labels Oct 6, 2025
@miketheman
Copy link
Member Author

I elected to update the form to reject and thus guide the user to input what becomes the ground truth value, rather than lower-case it during query time, so that we are not manipulating the ground truth, and reduce the time-to-discover a problem with the configuration.

@miketheman
Copy link
Member Author

Looks like:
image

@di
Copy link
Member

di commented Oct 6, 2025

I think if GitLab is case-insensitive, we should just be case-insensitive as well?

@miketheman
Copy link
Member Author

miketheman commented Oct 6, 2025

I think if GitLab is case-insensitive, we should just be case-insensitive as well?

I previously wrote:

I elected to update the form to reject and thus guide the user to input what becomes the ground truth value, rather than lower-case it during query time, so that we are not manipulating the ground truth, and reduce the time-to-discover a problem with the configuration.

Expanding on this - if we do lowercase at query-time, our machinery would still provide the end user a "publisher not found" message when incorrect, and then kick off the questions of "what's the ground truth configured, what did we get in the claims, etc". Whereas prompting them to input the value that we will expect to find later at the outset removes a set of concerns, especially as I look at other behaviors related to validation.

The way I see it, if there's data stored in the DB, that's the ground truth. We should store what folks give us, instead of lowercasing at insert-time, since then we modify the value they gave us, creating a mismatch of expectations.

If we store the user-supplied, case-sensitive value (like we do today), then we must remember that every time we access that value, we have to lowercase it, even during manual debugging in a SQL prompt. My memory isn't that good, so I'm preferring to have the user control the input and provide valid values and reduce potential confusion down the line.

Do you see an inherent problem with this approach?

@di
Copy link
Member

di commented Oct 6, 2025

Yeah, I think the issue is that if the user puts in caseSensitiveProject (because that's what they see displayed on GitLab and we reject it because we're being case-sensitive and overly restrictive, that's somewhat confusing:

  • If the user knows and expects GitLab to be case-insensitive, it's confusing because it's the opposite of what they know.
  • If the user doesn't know that GitLab is case-insensitive, it's confusing because it seems like we're telling them to put in something other than what their repo is named.

The way I see it, if there's data stored in the DB, that's the ground truth. We should store what folks give us, instead of lowercasing at insert-time, since then we modify the value they gave us, creating a mismatch of expectations.

I agree, we should not mutate what we've been originally given by the user before storing it.

If we store the user-supplied, case-sensitive value (like we do today), then we must remember that every time we access that value, we have to lowercase it, even during manual debugging in a SQL prompt. My memory isn't that good, so I'm preferring to have the user control the input and provide valid values and reduce potential confusion down the line.

I disagree, I don't think this is worth making users jump through extra hoops here just to do normalization for us.

I think either way when doing a manual query, we'll need to remember that there might be a difference between the repo name a user provides us and what's in the database (e.g. if they come to us with a repo named https://gitlab.com/coBib/cobib, we'll have to remember that this will be stored as cobib and not coBib in the database.

Additionally, this doesn't help any of the users that have a mis-configured publisher currently, whereas updating the query to do normalization would make their publishers work immediately:

warehouse=> SELECT count(*) FROM gitlab_oidc_publishers WHERE namespace ~ '[A-Z]' OR project ~ '[A-Z]';
 count
-------
    66
(1 row)

@ewdurbin
Copy link
Member

ewdurbin commented Oct 7, 2025

I'm in favor of us accepting whatever the user gives, more or less full-stop. this is a recipe for frustration and errors, not dissimilar from a banking app refusing to let me paste in an account number from my password manager and forcing me to type it in piece by piece and opening the door to typos.

Example: a user wants to paste in PyPI/Receiver and is refused they may very well type in pypi/reciever and introduce a typo (my favorite typo in fact).

@woodruffw
Copy link
Member

Yeah, I think PyPI should probably be permissive of casing decisions made by the user at TP enrollment/provisioning time -- it's frustrating that CI/CD services don't tend to document these details super well, but IMO the user experience is a lot better if PyPI behaves like GitHub and GitLab (which are case-insensitive-but-preserving AFAICT in this context) and allows matching regardless of how the user originally cased their inputs.

In other words, IMO we should allow uppercase values here and accept a small amount of additional complication at lookup/matching time.

(Relatedly, I think it was a mistake on my part to do a "normalization" trick for GitHub where we replace the user's given username with the one the GitHub API responds with -- similarly to here, we probably should just take whatever the user puts in as long as it's valid and honor their case choices.)

Ref:

# NOTE: Use the normalized owner name as provided by GitHub.
self.normalized_owner = owner_info["login"]
self.owner_id = owner_info["id"]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
trusted-publishing UX/UI design, user experience, user interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GitLab Trusted Publisher case sensitivity
4 participants